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Abstract —Convex optimization is a powerful tool for resource 
allocation and signal processing in wireless networks. As the 
network density is expected to drastically increase in order to 
accommodate the exponentially growing mobile data traffic, per¬ 
formance optimization problems are entering a new era charac¬ 
terized by a high dimension and/or a large number of constraints, 
which poses significant design and computational challenges. In 
this paper, we present a novel two-stage approach to solve large- 
scale convex optimization problems for dense wireless cooperative 
networks, which can effectively detect Infeasiblllty and enjoy 
modeling flexibility. In the proposed approach, the original 
large-scale convex problem is transformed into a standard cone 
programming form in the first stage via matrix stuffing, which 
only needs to copy the problem parameters such as channel state 
information (CSI) and quallty-of-servlce (QoS) requirements to 
the pre-stored structure of the standard form. The capability of 
yielding infeasibility certificates and enabling parallel computing 
is achieved by solving the homogeneous self-dual embedding of 
the primal-dual pair of the standard form. In the solving stage, 
the operator splitting method, namely, the alternating direction 
method of multipliers (ADMM), is adopted to solve the large-scale 
homogeneous self-dual embedding. Compared with second-order 
methods, ADMM can solve large-scale problems in parallel with 
modest accuracy within a reasonable amount of time. Simulation 
results will demonstrate the speedup, scalability, and reliability 
of the proposed framework compared with the state-of-the-art 
modeling frameworks and solvers. 

Index Terms —Dense wireless networking, large-scale opti¬ 
mization, matrix stuffing, operator splitting method, ADMM, 
homogeneous self-dual embedding. 


I. Introduction 

T he proliferation of smart mobile devices, coupled with 
new types of wireless applications, has led to an expo¬ 
nential growth of wireless and mobile data traffic. In order to 
provide high-volume and diversified data services, ultra-dense 
wireless cooperative network architectures have been proposed 
for next generation wireless networks HI, e.g., Cloud-RAN 
m, Q, and distributed antenna systems Q. To enable efficient 
interference management and resource allocation, large-scale 
multi-entity collaboration will play pivotal roles in dense wire¬ 
less networks. For instance, in Cloud-RAN, all the baseband 
signal processing is shifted to a single cloud data center with 
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very powerful computational capability. Thus the centralized 
signal processing can be performed to support large-scale 
cooperative transmission/reception among the radio access 
units (RAUs). 

Convex optimization serves as an indispensable tool for 
resource allocation and signal processing in wireless com¬ 
munication systems 0, 0, o. For instance, coordinated 
beamforming 0 often yields a direct convex optimization 
formulation, i.e., second-order cone programming (SOCP) 0. 
The network max-min fairness rate optimization cni can 
be solved through the bi-section method 0 in polynomial 
time, wherein a sequence of convex subproblems are solved. 
Furthermore, convex relaxation provides a principled way of 
developing polynomial-time algorithms for non-convex or NP- 
hard problems, e.g., group-sparsity penalty relaxation for the 
NP-hard mixed integer nonlinear programming problems 0, 
semidefinite relaxation 0 for NP-hard robust beamforming 
M, na and multicast beamforming 03, and sequential 
convex approximation to the highly intractable stochastic 
coordinated beamforming IITtII . 

Nevertheless, in dense wireless cooperative networks 0, 
which may possibly need to simultaneously handle hundreds 
of RAUs, resource allocation and signal processing problems 
will be dramatically scaled up. The underlying optimization 
problems will have high dimensions and/or large numbers 
of constraints (e.g., per-RAU transmit power constraints and 
per-MU (mobile user) QoS constraints). For instance, for a 
Cloud-RAN with 100 single-antenna RAUs and 100 single¬ 
antenna MUs, the dimension of the aggregative coordinated 
beamforming vector (i.e., the optimization variables) will be 
10''^. Most advanced off-the-shelf solvers (e.g., SeDuMi ifTsll . 
SDPT3 ifTbll and MOSEK ifTTll ) are based on the interior- 
point method. However, the computational burden of such 
second-order method makes it inapplicable for large-scale 
problems. For instance, solving convex quadratic programs 
has cubic complexity mi. Furthermore, to use these solvers, 
the original problems need to be transformed to the standard 
forms supported by the solvers. Although the parser/solver 
modeling frameworks like CVX 1191 and YALMIP lIMIl can 
automatically transform the original problem instances into 
standard forms, it may require substantial time to perform 
such transformation im, especially for problems with a large 
number of constraints ll22ll . 

One may also develop custom algorithms to enable efficient 
computation by exploiting the structures of specific problems. 
For instance, the uplink-downlink duality 0 is exploited to 
extract the structures of the optimal beamformers ll23 and 
enable efficient algorithms. However, such an approach still 
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has the cubic complexity to perform matrix inversion at each 
iteration 12^ . First-order methods, e.g., the ADMM algorithm 
1251 . have recently attracted attention for their distributed and 
parallelizable implementation, as well as the capability of 
scaling to large problem sizes. However, most existing ADMM 
based algorithms cannot provide the certificates of infeasibility 
ED, 121, 1261. Furthermore, some of them may still fail to 
scale to large problem sizes, due to the SOCP subproblems 
l2^ or semidefinite programming (SDP) subproblems d 
needed to be solved at each iteration. 

Without efficient and scalable algorithms, previous studies 
of wireless cooperative networks either only demonstrate 
performance in small-size networks, typically with less than 
10 RAUs, or resort to sub-optimal algorithms, e.g., zero¬ 
forcing based approaches d, d- Meanwhile, from the 
above discussion, we see that the large-scale optimization 
algorithms to be developed should possess the following two 
features: 

• To scale well to large problem sizes with parallel com¬ 
puting capability; 

• To effectively detect problem infeasibility, i.e., provide 
certificates of infeasibility. 

To address these two challenges in a unified way, in this 
paper, we shall propose a two-stage approach as shown in 
Fig. m The proposed framework is capable to solve large- 
scale convex optimization problems in parallel, as well as 
providing certificates of infeasibility. Specifically, the original 
problem will be first transformed into a standard cone 
programming form ^cone ifTSl based on the Smith form 
reformulation ||2^ . via introducing a new variable for each 
subexpression in the disciplined convex programming form 
ll^ of the original problem. This will eventually transform the 
coupled constraints in the original problem into the constraint 
only consisting of two convex sets: a subspace and a convex set 
formed by a Cartesian product of a finite number of standard 
convex cones. Such a structure helps to develop efficient 
parallelizable algorithms and enable the infeasibility detection 
capability simultaneously via solving the homogeneous self¬ 
dual embedding OTIl of the primal-dual pair of the standard 
form by the ADMM algorithm. 

As the mapping between the standard cone program and 
the original problem only depends on the network size (i.e., 
the numbers of RAUs, MUs and antennas at each RAU), 
we can pre-generate and store the structures of the standard 
forms with different candidate network sizes. Then for each 
problem instance, that is, given the channel coefficients, QoS 
requirements, and maximum RAU transmit powers, we only 
need to copy the original problem parameters to the standard 
cone programming data. Thus, the transformation procedure 
can be very efficient and can avoid repeatedly parsing and 
re-generating problems m, EQi. This technique is called 
matrix stuffing ||2TI| . Il22l . which is essential for the proposed 
framework to scale well to large problem sizes. It may also 
help rapid prototyping and testing for practical equipment 
development. 



Fig. 1. The proposed two-stage approach for large-scale convex optimization. 
The optimal solution or the certificate of infeasibility can be extracted from 
X* by the ADMM solver. 

A. Contributions 

The major contributions of the paper are summarized as 
follows: 

1) We formulate main performance optimization problems 
in dense wireless cooperative networks into a general 
framework. It is shown that all of them can essentially be 
solved through solving one or a sequence of large-scale 
convex optimization or convex feasibility problems. 

2) To enable both the infeasibility detection capability and 
parallel computing capability, we propose to transform 
the original convex problem to an equivalent standard 
cone program. The transformation procedure scales very 
well to large problem sizes with the matrix stuffing 
technique. Simulation results will demonstrate the ef¬ 
fectiveness of the proposed fast transformation approach 
over the state-of-art parser/solver modeling frameworks. 

3) The operator splitting method is then adopted to solve 
the large-scale homogeneous self-dual embedding of the 
primal-dual pair of the transformed standard cone pro¬ 
gram in parallel. This first-order optimization algorithm 
makes the second stage scalable. Simulation results will 
show that it can speedup several orders of magnitude 
over the state-of-art interior-point solvers. 

4) The proposed framework enables evaluating various 
cooperation strategies in dense wireless networks, and 
helps reveal new insights numerically. For instance, 
simulation results demonstrate a significant performance 
gain of optimal beamforming over sub-optimal schemes, 
which shows the importance of developing large-scale 
optimal beamforming algorithms. 

This work will serve the purpose of providing practical 
and theoretical guidelines on designing algorithms for generic 
large-scale optimization problems in dense wireless networks. 

B. Organization 

The remainder of the paper is organized as follows. Section 
im presents the system model and problem formulations. In 
Section imi a systematic cone programming form transforma¬ 
tion procedure is developed. The operator splitting method is 
presented in Section |IV] The practical implementation issues 
are discussed in Section |V] Numerical results will be demon¬ 
strated in Section |VT] Finally, conclusions and discussions are 
presented in Section Ivnl To keep the main text clean and free 
of technical details, we divert most of the proofs, derivations 
to the appendix. 

II. Large-Scale Optimization in Dense wireless 
Cooperative Networks 

In this section, we will first present two representative 
optimization problems in wireless cooperative networks, i.e., 
the network power minimization problem and the network 
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Fig. 2. The architecture of Cloud-RAN, in which, all the RAUs are connected 
to a BBU pool through high-capacity and low-latency optical fronthaul links. 
To enable full cooperation among RAUs, it is assumed that all the user data 
and CSI are available at the BBU pool. 


utility maximization problem. We will then provide a unified 
formulation for large-scale optimization problems in dense 
wireless cooperative networks. 


A. Signal Model 

Consider a dense fully cooperative networlf] with L RAUs 
and K single-antenna MUs, where the /-th RAU is equipped 
with Ni antennas. The centralized signal processing is per¬ 
formed at a central processor, e.g., the baseband unit pool 
in Cloud-RAN ||2l, ||3 as shown in Fig. |2l The propagation 
channel from the Z-th RAU to the fc-th MU is denoted as 
hfci € C^‘,Vfc,/. In this paper, we focus on the downlink 
transmission for illustrative purpose. But our proposed ap¬ 
proach can also be applied in the uplink transmission, as we 
only need to exploit the convexity of the resulting performance 
optimization problems. The received signal € C at MU k 
is given by 

L L 

Uk = 4- -f nk,yk, (1) 

Z—1 i^k 1—1 


where Sk is the encoded information symbol for MU k with 
IE[|sfep] = 1, vjfc G C'^‘ is the transmit beamforming vector 
from the Uth RAU to the fc-th MU, and Uk ^ CAf(0,a^) is 
the additive Gaussian noise at MU fc. We assume that s^’s and 
rifc’s are mutually independent and all the users apply single 
user detection. Thus the signaTto-interference-plus-noise ratio 
(SINR) of MU fc is given by 


TfeM = 


IhfcVfcl 




,Vfc, 


( 2 ) 


^The full cooperation among all the RAUs with global CSI and full user 
data sharing is used as an illustrative example. The proposed framework can 
be extended to more general cooperation scenarios, e.g., with partial user data 
sharing among RAUs as presented in (3 Section 1.3.1]. 


where ..., G with N = Ya^-^Ni, 

Vfe = [vf,, ..., G and V 4 [vf,..., G 

We assume that each RAU has its own power constraint, 

K 

J2\\vik\\l<Pi,yi, (3) 

k=l 

where Pi > 0 is the maximum transmit power of the Tth 
RAU. In this paper, we assume that the full and perfect CSI is 
available at the central processor and all RAUs only provide 
unicast/broadcast services. 

B. Network Power Minimization 

Network power consumption is an important performance 
metric for the energy efficiency design in wireless cooperative 
networks. Coordinated beamforming is an efficient way to 
design energy-efficient systems 0 , in which, beamforming 
vectors v/^’s are designed to minimize the total transmit power 
among RAUs while satisfying the QoS requirements for all the 
MUs. Specifically, given the target SINRs 7 = ( 71 ,... , 7 if) 
for all the MUs with 7 ^ > 0, Vfc, we will solve the following 
total transmit power minimization problem: 

L K 

: minimize 

i=i k=i 

where V is the intersection of the sets formed by transmit 
power constraints and QoS constraints, i.e., 

v = iPiniP 2 n---nT’i,nQinQ 2 -.-,nQK, (5) 

where Vi ’s are feasible sets of v that satisfy the per-RAU 
transmit power constraints, i.e., 

= ( 6 ) 

and Qfc’s are the feasible sets of v that satisfy the per-MU 
QoS constraints, i.e., 

Qfc = {vGC'^'^:rfc(v)>74,Vfc. (7) 

As all the sets Q^’s and P/’s can be reformulated into 
second-order cones as shown in El, problem 4 ^ 1 ( 7 ) can be 
reformulated as an SOCP problem. 

However, in dense wireless cooperative networks, the mo¬ 
bile hauling network consumption can not be ignored. In lO, 
a two-stage group sparse beamforming (GSBF) framework 
is proposed to minimize the network power consumption for 
Cloud-RAN, including the power consumption of all optical 
fronthaul links and the transmit power consumption of all 
RAUs. Specially, in the first stage, the group-sparsity structure 
of the aggregated beamformer v is induced by minimizing the 
weighted mixed ^i/£ 2 -norm of v, i.e., 

L 

,^2(7) : minimize y^a;/||v/||2, (8) 

vGV ^' 

where V/ = [v^,...,v^]^ G is the aggregated 

beamforming vector at RAU I, and tu/ > 0 is the corresponding 
weight for the beamformer coefficient group v/. Based on 
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the (approximated) group sparse beamformer v*, which is 
the optimal solution to .^^ 2 ( 7 ), in the second stage, an RAU 
selection procedure is performed to switch off some RAUs 
so as to minimize the network power consumption. In this 
procedure, we need to check if the remaining RAUs can 
support the QoS requirements for all the MUs, i.e., check the 
feasibility of problem ^i('y) given the active RAUs. Please 
refer to 13 for more details on the group sparse beamforming 
algorithm. 

C. Network Utility Maximization 
Network utility maximization is a general approach to 
optimize network performance. We consider maximizing an 
arbitrary network utility function U{ri{v),... ,r k{'v)) that 
is strictly increasing in the SINR of each MU Q, i.e., 

,^3 : maximize U{Ti{v),... ,Tk{v)), (9) 

vGVi 

where Vi = is the intersection of the sets of the 

per-RAU transmit power constraints (|6l). It is generally very 
difficult to solve, though there are tremendous research efforts 
on this problem Q. In particular, Liu et al. in ll^ proved 
that ^^3 is NP-hard for many common utility functions, e.g., 
weighted sum-rate. Please refer to Q Table 2.1] for details on 
classification of the convexity of utility optimization problems. 

Assume that we have the prior knowledge of SINR values 
r^,..., that can be achieved by the optimal solution to 
problem Then the optimal solution to problem ,^3^1(7) 
with target SINRs as 7 = (T^,..., T^) is an optimal solution 
to problem , 5^3 as well 123 . The difference between problem 
,^1(7) and problem 3^3 is that the SINRs in ^1(7) are 
pre-defined, while the optimal SINRs in t ^3 need to be 
searched. For the max-min fairness maximization problem, 
optimal SINRs can be searched by the bi-section method 
ll22l . which can be accomplished in polynomial time. For 
the general increasing utility maximization problem t^ 3 , the 
corresponding optimal SINRs can be searched as follows 

maximize ( 7 ( 71 ,... , 7 /s'), ( 10 ) 

■yeTZ 

where TZ S is the achievable performance region 

7^ = {(^l(v),...,^/f(v)) :ve Vi}. (ii) 

Problem (flOl) is a monotonic optimization problem ll^ and 

thus can be solved by the polyblock outer approximation 
algorithm ll^ or the branch-reduce-and-bound algorithm Q. 
The general idea of both algorithms is iteratively improving 
the lower-bound Umin and upper-bound [/max of the objective 
function of problem (fTOl) such that 

t/max - [/min < C, (12) 

for a given accuracy e in finite iterations. In particular, at the 
TO-iteration, we need to check the convex feasibility problem 
of ,! 32 ’i( 7 [”" 1 ) given the target SINRs 7[""1 = (F^™',..., F^'). 
However, the number of iterations scales exponentially with 
the number of MUs IJl. Please refer to the tutorial f?! Section 
2.3] for more details. Furthermore, the network achievable 
rate region ll^ can also be characterized by the rate profile 
method iTSl via solving a sequence of such convex feasibility 
problems ,^ 1 ( 7 ). 


D. A Unified Framework of Large-Scale Network Optimiza¬ 
tion 

In dense wireless cooperative networks, the central proces¬ 
sor can support hundreds of RAUs for simultaneously trans¬ 
mission/reception El. Therefore, all the above optimization 
problems are shifted into a new domain with a high problem 
dimension and a large number of constraints. As presented 
previously, to solve the performance optimization problems, 
we essentially need to solve a sequence of the following 
convex optimization problem with different problem instances 
(e.g., different channel realizations, network sizes and QoS 
targets) 

: minimize /(v), (13) 

vGV 

where /(v) is convex in v as shown in and ,^ 2 ( 7 )- 

Solving problem means that the corresponding algorithm 
should return the optimal solution or the certificate of infea¬ 
sibility. 

For all the problems discussed above, problem 3^ can be 
reformulated as an SOCP problem, and thus it can be solved 
in polynomial time via the interior-point method, which is 
implemented in most advanced off-the-shelf solvers, e.g., pub¬ 
lic software packages like SeDuMi ifTSl and SDPT3 ifTbll and 
commercial software packages like MOSEK El. However, 
the computational cost of such second-order methods will 
be prohibitive for large-scale problems. On the other hand, 
most custom algorithms, e.g., the uplink-downlink approach 
ISl and the ADMM based algorithms ifTTI . Il24l . EH, however, 
fail to either scale well to large problem sizes or detect the 
infeasibility effectively. 

To overcome the limitations of the scalability of the state- 
of-art solvers and the capability of infeasibility detection of 
the custom algorithms, in this paper, we propose to solve 
the homogeneous self-dual embedding ED (which aims at 
providing necessary certificates) of problem via a first- 
order optimization method ESl (i.e., the operator splitting 
method). This will be presented in Section |IV] To arrive 
at the homogeneous self-dual embedding and enable parallel 
computing, the original problem will be first transformed into 
a standard cone programming form as will be presented in 
Section m This forms the main idea of the two-stage based 
large-scale optimization framework as shown in Fig. [D 

HI. Matrix Stuffing for Fast Standard Cone 
Programming Transformation 

Although the parser/solver modeling language framework, 
like CVX ifT^ and YALMIP l|20l, can automatically transform 
the original problem instance into a standard form, it requires 
substantial time to accomplish this procedure ED. ED- In 
particular, for each problem instance, the parser/solver model¬ 
ing frameworks need to repeatedly parse and canonicalize it. 
To avoid such modeling overhead of reading problem data and 
repeatedly parsing and canonicalizing, we propose to use the 
matrix stuffing technique ED. Il22l to perform fast transfor¬ 
mation by exploiting the problem structures. Specifically, we 
will first generate the mapping from the original problem to 
the cone program, and then the structure of the standard form 
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will be stored. This can be accomplished offline. Therefore, for 
each problem instance, we only need to stuff its parameters to 
data of the corresponding pre-stored structure of the standard 
cone program. Similar ideas were presented in the emerging 
parse/generator modeling frameworks like CVXGEN 0^ and 
QCML JH], which aim at embedded applications for some 
specific problem families. In this paper, we will demonstrate 
in Section |VT] that matrix stuffing is essential to scale to large 
problem sizes for fast transformation at the first stage of the 
proposed framework. 

A. Conic Formulation of Convex Programs 

In this section, we describe a systematic way to transform 
the original problem ^ to the standard cone program. To 
enable parallel computing, a common way is to replicate some 
variables through either exploiting problem structures m, 
ED or using the consensus formulation ll25l . Il26l . How¬ 
ever, directly working on these reformulations is difficult to 
provide computable mathematical certificates of infeasibility. 
Therefore, heuristic criteria are often adopted to detect the 
infeasibility, e.g., the underlying problem instance is reported 
to be infeasible when the algorithm exceeds the pre-defined 
maximum iterations without convergence ll^ . To unify the 
requirements of parallel and scalable computing and to provide 
computable mathematical certificates of infeasibility, in this 
paper, we propose to transform the original problem 3^ to the 
following equivalent cone program 3^cone' 

3^cone '■ minimize 

subject to Ai> + fi — h (14) 

e R” X /C, (15) 

where iz € R" and G R™ are the optimization variables, 
/C = {0}’' X X • • • X 5"*" with as the standard second- 
order cone of dimension p 

5^ = {(2 /,x)gRxRP-1|||x||<2 /}, (16) 

and 5^ is defined as the cone of nonnegative reals, i.e., R+. 
Here, each S‘ has dimension m-i such that nrii) = m, 

A G R™^", b G R'", c G R". The equivalence means that 
the optimal solution or the certificate of infeasibility of the 
original problem can be extracted from the solution to the 
equivalent cone program ,^cone- To reduce the storage and 
memory overhead, we store the matrix A, vectors b and c in 
the sparse form llJTll by only storing the non-zero entries. 

The general idea of such transformation is to rewrite the 
original problem 3^ into a Smith form by introducing a new 
variable for each subexpression in disciplined convex program¬ 
ming form of problem The details are presented in 
the Appendix. Working with this transformed standard cone 
program J^cone has the following two advantages: 

• The homogeneous self-dual embedding of the primal- 
dual pair of the standard cone program can be induced, 
thereby providing certificates of infeasibility. This will be 
presented in Section IIV-AI 

• The feasible set V (|5]l formed by the intersection of a 
finite number of constraint sets Vi’s and Q^’s in the 


original problem IV can be transformed into two sets 
in 3^cone- a subspace (O and a convex cone /C, which is 
formed by the Cartesian product of second-order cones. 
This salient feature will be exploited to enable parallel 
and scalable computing, as will be presented in Section 

USD 

B. Matrix Stuffing for Fast Transformation 

Inspired by the work ED on fast optimization code deploy¬ 
ment for embedding second-order cone program, we propose 
to use the matrix stuffing technique ED, ED to transform 
the original problem into the standard cone program quickly. 
Specifically, for any given network size, we first generate 
and store the structure that maps the original problem 3^ 
to the standard form 3^cone- Thus, the pre-stored standard 
form structure includes the problem dimensions (i.e., m and 
n), the description of V (i.e., the array of the cone sizes 
[r, nil, m 2 , ■ ■ ■, fnq]), and the symbolic problem parameters A, 
b and c. This procedure can be done offline. 

Based on the pre-stored structure, for a given problem 
instance we only need to copy its parameters (i.e., the 
channel coefficients hi<-’s, maximum transmit powers Pfs, 
SINK targets 7 fe’s) to the corresponding data in the standard 
form 3^cone (i e., A and b). Details of the exact description of 
copying data for transformation are presented in the Appendix. 
As the procedure for transformation only needs to copy mem¬ 
ory, it thus is suitable for fast transformation and can avoid 
repeated parsing and generating as in parser/solver modeling 
frameworks like CVX. 

Remark 1: As shown in the Appendix, the dimension of 
the transformed standard cone program .^cone becomes m = 
{L + K) + {2NK -f 1) -f X;f=i (2 A A/ -f l)+K{2K + 2), which 
is much larger than the dimension of the original problem, 
i.e., 2NK in the equivalent real-field. But as discussed above, 
there are unique advantages of working with this standard 
form, which compensate for the increase in the size, as will 
be explicitly presented in later sections. 

IV. The Operator Splitting Method For 
Large-Scale Homogeneous Sele-Dual Embedding 

Although the standard cone program J^cone itself is suitable 
for parallel computing via the operator splitting method EH, 
directly working on this problem may fail to provide certifi¬ 
cates of infeasibility. To address this limitation, based on the 
recent work by O’Donoghue et al. ESI, we propose to solve 
the homogeneous self-dual embedding ETl of the primal-dual 
pair of the cone program 3^cone- The resultant homogeneous 
self-dual embedding is further solved via the operator splitting 
method, a.k.a. the ADMM algorithm ESI . 

A. Homogeneous Self-Dual Embedding of Cone Programming 

The basic idea of the homogeneous self-dual embedding is 
to embed the primal and dual problems of the cone program 
3^cone into a single feasibility problem (i.e., finding a feasible 
point of the intersection of a subspace and a convex set) such 
that either the optimal solution or the certificate of infeasibility 
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of the original cone program can be extracted from the 

solution of the embedded problem. 

The dual problem of i^^cone is given by ll39ll 

^cone : maximize —h^rj 

77 . A 

subject to —A^r] + A = c 

(A,J 7 )e{ 0 r x/C*, (17) 

where A G M" and rj G K™ are the dual variables, /C* is the 
dual cone of the convex cone /C. Note that /C = /C*, i.e., /C 
is self dual. Define the optimal values of the primal program 
■^cone and dual program ^cone are p* and c?*, respectively. 
Let p* = +00 and p* = —00 indicate primal infeasibility 
and unboundedness, respectively. Similarly, let d* = —00 and 
d* = +00 indicate the dual infeasibility and unboundedness, 
respectively. We assume strong duality for the convex cone 
program .^cone, i-c., p* = d*, including cases when they are 
infinite. This is a standard assumption for practically designing 
solvers for conic programs, e.g., it is assumed in ca, m, 
ini, ED, on. Besides, we do not make any regularity 
assumption on the feasibility and boundedness assumptions 
on the primal and dual problems. 

1) Certificates of Infeasibility: Given the cone program 
J^cone, a main task is to detect feasibility. In ||40l Theorem 1], a 
sufficient condition for the existence of strict feasible solution 
was provided for the transmit power minimization problem 
without power constraints. However, for the general problem 
with per-MU QoS constraints and per-RAU transmit power 
constraints, it is difficult to obtain such a feasibility condition 
analytically. Therefore, most existing works either assume that 
the underlying problem is feasible El or provide heuristic 
ways to handle infeasibility ll24l . 

Nevertheless, the only way to detect infeasibility effectively 
is to provide a certificate or proof of infeasibility as presented 
in the following proposition. 

Proposition 1: [Certificates of Infeasibility] The following 
system 


Au + p, = h, pb G 1C, (18) 

is infeasible if and only if the following system is feasible 

A^rj ^0,rjG Kf, b^r/ < 0. (19) 

Therefore, any dual variable rj satisfying the system GSll 
provides a certificate or proof that the primal program J^cone 
(equivalently the original problem is infeasible. 

Similarly, any primal variable v> satisfying the following 
system 


- Aiz e< 0, (20) 

is a certificate of the dual program infeasibility. 

Proof: This result directly follows the theorem of strong 
alternatives S Section 5.8.2]. ■ 

2 ) Optimality Conditions: If the transformed standard cone 
program iJ^cone is feasible, then {v*, p*,\*,rj*) are optimal 
if and only if they satisfy the following Karush-Kuhn-Tucker 


(KKT) conditions 


Av* + p*-h = 0 (21) 

A'^ry* - A* + c = 0 (22) 

{V*fp* = 0 (23) 

(iz*, p*, A*, J7*) e R’" X /C X {0}'* X 1C*. (24) 

In particular, the complementary slackness condition (|2^ can 
be rewritten as 

+ b'^r/* = 0, (25) 

which explicitly forces the duality gap to be zero. 

3) Homogeneous Self-Dual Embedding: We can first detect 
feasibility by Proposition [D and then solve the KKT system if 
the problem is feasible and bounded. However, the disadvan¬ 
tage of such a two-phase method is that two related problems 
(i.e., checking feasibility and solving KKT conditions) need 
to be solved sequentially ED- To avoid such inefficiency, 
we propose to solve the following homogeneous self-dual 
embedding ETII : 


Ai> + /i, — br = 0 (26) 
A^p - A -b cr = 0 (27) 
c^v + b^r/ -b K = 0 (28) 
(jz, p, A, rj, r, k) G R" X /C X {0}" x /C* x R+ x R+, (29) 


to embed all the information on the infeasibility and optimality 
into a single system by introducing two new nonnegative 
variables r and k, which encode different outcomes. The 
homogeneous self-dual embedding thus can be rewritten as 
the following compact form 


where 


,^hsd ; find (x, y) 
subject to y = Qx 

xGC,y gC*, 


A' 


■ 0 A^ c' 


V 



-A 0 b 


17 

K 


——b^ 0 


r 


y 7. 




y 


Q 

X 


(30) 


(31) 


X G R™+"+l y G Q g ]^(m-|-n-|-l)x(m-|-n-|-l) Q _ 

R” X /C* X R+ and C* = {0}” x 1C x R+. This system has a 
trivial solution with all variables as zeros. 

The homogeneous self-dual embedding problem ,^hsd is 
thus a feasibility problem finding a nonzero solution in 
the intersection of a subspace and a convex cone. Let 
(jz, p, A, T], T, k) be a non-zero solution of the homogeneous 
self-dual embedding. We then have the following remarkable 
trichotomy derived in Eli : 

• Case 1: T > 0, K = 0, then 


1 > = v/T,f) = t]/t,p = p/t (32) 

are the primal and dual solutions to the cone program 

^ cone- 

• Case 2: r = 0, K > 0; this implies c^iz-bb^rj < 0, then 
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1) If b^T 7 < 0, then f) = rj/(—b^ry) is a certificate of 
the primal infeasibility as 

A^f) = 0, G V*, b^f) = -1. (33) 

2) If c^u < 0, then i> = v/{—c^C/) is a certificate of 
the dual infeasibility as 

-Ai>G V,c^i> = -1. (34) 

• Case 3; r = K = 0; no conclusion can be made about 
the cone problem i^^cone- 

Therefore, from the solution to the homogeneous self-dual 
embedding, we can extract either the optimal solution (based 
on dhOl l) or the certificate of infeasibility for the original 
problem. Furthermore, as the set ( l29l l is a Cartesian product of 
a finite number of sets, this will enable parallelizable algorithm 
design. With the distinct advantages of the homogeneous 
self-dual embedding, in the sequel, we focus on developing 
efficient algorithms to solve the large-scale feasibility problem 
^HSD via the operator splitting method. 

B. The Operator Splitting Method 

Conventionally, the convex homogeneous self-dual embed¬ 
ding c^hsd can be solved via the interior-point method, e.g., 
ca, Qa, ini, m- However, such second-order method 
has cubic computational complexity for the second-order cone 
programs m, and thus the computational cost will be pro¬ 
hibitive for large-scale problems. Instead, O’Donoghue et al. 
II 39 I develop a first-order optimization algorithm based on the 
operator splitting method, i.e., the ADMM algorithm 12^ . to 
solve the large-scale homogeneous self-dual embedding. The 
key observation is that the convex cone constraint in ,^hsd is 
the Cartesian product of standard convex cones (i.e., second- 
order cones, nonnegative reals and free variables), which 
enables parallelizable computing. Furthermore, we will show 
that the computation of each iteration in the operator splitting 
method is very cheap and efficient. 

Specifically, the homogeneous self-dual embedding ,^hsd 
can be rewritten as 

minimize /cxC* (x,y)-f/Qx=y(x,y), (35) 

where Is is the indicator function of the set S, i.e., Is{z) 
is zero for z G S and it is -l-oo otherwise. By replicating 
variables x and y, problem (l35l l can be transformed into the 
following consensus form ll25l Section 7.1] 

^ADMM : minimize /cxc*(x,y) -f/Qi=y(x,y) 

subject to (x,y) = (x,y), (36) 

which is readily to be solved by the operator splitting method. 

Applying the ADMM algorithm ||25] Section 3.1] to prob¬ 
lem .^^admm and eliminating the dual variables by exploiting 
the self-dual property of the problem ,^hsd (Please refer to 
ll3^ Section 3] on how to simplify the ADMM algorithm), the 
final algorithm is shown as follows: 

rxt*+il = (I + Q)"^(xW -f yW) 

OS ADMM : xt'+i] = nc(k[*+il - y[*l) (37) 

^ y[i+i\ — y[j] _ x[j-|-l] _|_ 


where nc(x) denotes the Euclidean projection of x onto the 
set C. This algorithm has the 0{l/k) convergence rate Bdll 
with k as the iteration counter (i.e., the e accuracy can be 
achieved in 0{l/e) iterations) and will not converge to zero 
if a nonzero solution exists ll39l Section 3.4]. Empirically, this 
algorithm can converge to modest accuracy within a reasonable 
amount of time. As the last step is computationally trivial, in 
the sequel, we will focus on how to solve the first two steps 
efficiently. 

1) Subspace Projection via Factorization Caching: The 
first step in the algorithm OiSadmm is a subspace projection. 
After simplification 0^ Section 4], we essentially need to 
solve the following linear equation at each iteration, i.e.. 


h-1 

1 

> 


u 


'izW 

-A -I 






S X b 

for the given i/bl and ryl'l at iteration i, where S G 
with d = m -f n is a symmetric quasidefinite matrix 14211 . To 
enable quicker inversions and reduce memory overhead via 
exploiting the sparsity of the matrix S, the sparse permuted 
LDL^ factorization ll37ll method can be adopted. Specifically, 
such factor-solve method can be carried out by first computing 
the sparse permuted LDL factorization as follows 

S = PLDL^P^, (39) 

where L is a lower triangular matrix, D is a diagonal matrix 
|[38ll and P with P“^ = P^ is a permutation matrix to 
fill-in of the factorization EH, i.e., the number of nonzero 
entries in L. Such factorization exists for any permutation P, 
as the matrix S is symmetric quasidefinite ll42l Theorem 2.1]. 
Computing the factorization costs much less than C>(l/3d^) 
flops, while the exact value depends on d and the sparsity 
pattern of S in a complicated way. Note that such factorization 
only needs to be computed once in the first iteration and can 
be cached for re-using in the sequent iterations for subspace 
projections. This is called the. factorization caching technique 

OH. 

Given the cached factorization (l3^ . solving subsequent 
projections x = (l38l l can be carried out by solving 

the following much easier equations: 

Pxi = b, Lx2 = Xi, Dx3 = X2, L’^X4 = X3, P^x = X 4 ,( 40 ) 

which cost zero flops, 0{sd) flops by forward substitution with 
s as the number of nonzero entries in L, 0{d) flops, 0{sd) 
flops by backward substitution, and zero flops, respectively m 
Appendix C]. 

2 ) Cone Projection via Proximal Operator Evaluation: The 
second step in the algorithm ONadmm is to project a point u) 
onto the cone C. As C is the Cartesian product of the finite 
number of convex cones Ci, we can perform projection onto C 
by projecting onto Ci separately and in parallel. Eurthermore, 
the projection onto each convex cone can be done with closed- 
forms. Specifically, for nonnegative real Ci = R+, we have that 
El Section 6.3.1] 

nc,(u;) = a;+, (41) 









8 


where the nonnegative part operator (•)+ is taken elementwise. 
For the second-order cone Ci = {( 2 /,x) € K.xR^’“^|||x|| < y}, 
we have that ll^ Section 6.3.2] 

f 0, ||a;||2 < -T 

nci(‘^,'r) = ■{ ||u ;||2 < r (42) 

i(l/2)(l + r/||u;|| 2 )(a;,||a;|| 2 ),||a;|| 2 >|r|. 

In summary, we have presented that each step in the 
algorithm O^admm can be computed efficiently. In particular, 
from both dTO and (l42l i. we see that the cone projection can 
be carried out very efficiently with closed-forms, leading to 
parallelizable algorithms. 

V. Practical Implementation Issues 

In previous sections, we have presented the unified two- 
stage framework for large-scale convex optimization in dense 
wireless cooperative networks. In this section, we will focus 
on the implementation issues of the proposed framework. 

A. Automatic Code Generation for Fast Transformation 

In the Appendix, we describe a systematic way to transform 
the original problem to the standard cone programming form. 
The resultant structure that maps the original problem to the 
standard form can be stored and re-used for fast transforming 
via matrix stuffing. This can significantly reduce the modeling 
overhead compared with the parse/solver modeling frame¬ 
works like CVX. However, it requires tedious manual works to 
find the mapping and may not be easy to verify the correctness 
of the generated mapping. Chu et al. ED gave such an 
attempt intending to automatically generate the code for matrix 
stuffing. However, the corresponding software package QCML 
ED, so far, is far from complete and may not be suitable for 
our applications. Extending the numerical-based transforma¬ 
tion modeling frameworks like CVX to the symbolic-based 
transformation modeling frameworks like QCML is not trivial 
and requires tremendous mathematical and technical efforts. In 
this paper, we derive the mapping in the Appendix manually 
and verify the correctness by comparing with CVX through 
extensive simulations. 

B. Implementation of the Operator Splitting Algorithm 

Theoretically, the presented operator splitting algorithm 
C’^admm is compact, parameter-free, with parallelizable com¬ 
puting and linear convergence. Practically, there are typically 
several ways to improve the efficiency of the algorithm. In 
particular, there are various tricks that can be employed to im¬ 
prove the convergence rate, e.g, over-relaxation, warm-staring 
and problem data scaling as described in 1^ . In the dense 
wireless cooperative networks with multi-entity collaborative 
architecture, we are interested in two particular ways to speed 
up the subspace projection of the algorithm OiSadmm, which 
is the main computational bottleneck. Specifically, one way 
is to use the parallel algorithms for the factorization ( l39b by 
utilizing the distributed computing and memory resources Il44ll . 
For instance, in the cloud computing environments in Cloud- 
RAN, all the baseband units share the computing, memory 
and storage resources in a single baseband unit pool ||2|, 


0 . Another way is to leverage symbolic factorization (l39l) 
to speed up the numerical factorization for each problem in¬ 
stance, which is a general idea for the code generation system 
CVXGEN for realtime convex quadratic optimization ll45ll 
and the interior-point method based SOCP solver EH for 
embedded systems. Eventually, the ADMM solver in Fig. [D 
can be symbolic based so as to provide numerical solutions 
for each problem instance extremely fast and in a realtime 
way. However, this requires further investigation. 

VI. Numerical Results 

In this section, we simulate the proposed two-stage based 
large-scale convex optimization framework for performance 
optimization in dense wireless cooperative networks. The cor¬ 
responding MATLAB code that can reproduce all the simula¬ 
tion results using the proposed large-scale convex optimization 
algorithm is available onlin^ 

We consider the following channel model for the link 
between the fc-th MU and the /-th RAU; 

hki = ( 43 ) 

where L{di^i) is the path-loss in dB at distance dki as shown 
in 0 Table I], Ski is the shadowing coefficient, pki is the 
antenna gain and iki is the small-scale fading coefficient. We 
use the standard cellular network parameters as showed in 0 
Table I]. All the simulations are carried out on a personal 
computer with 3.2 GHz quad-core Intel Core i5 processor and 
8 GB of RAM running Linux. The reference implementation 
of the operator splitting algorithm SCS is available onlinfl 
which is a general software package for solving large-scale 
convex cone problems based on ll^ and can be called by the 
modeling frameworks CVX and CVXPY HTl . The settings 
(e.g., the stopping criteria) of SCS can be found in EH. 

The proposed two-stage approach framework, termed “Ma¬ 
trix Stuffing-i-SCS”, is compared with the following state-of- 
art frameworks; 

> CVX-i-SeDuMi/SDPT3/MOSEK; This category adopts 
second-order methods. The modeling framework CVX 
will first automatically transform the original problem 
instance (e.g., the problem written in the disciplined 
convex programming form) into the standard cone pro¬ 
gramming form and then call an interior-point solver, e.g., 
SeDuMi Ea, SDPT3 IM or MOSEK ET). 

• CVX-i-SCS; In this first-order method based framework, 
CVX first transforms the original problem instance into 
the standard form and then calls the operator splitting 
solver SCS. 

We define the “modeling time” as the transformation time 
for the first stage, the “solving time” as the time spent for the 
second stage, and the “total time” as the time of the two stages 
for solving one problem instance. As the large-scale convex 
optimization algorithm should scale well to both the modeling 
part and the solving part simultaneously, the time comparison 
of each individual stage will demonstrate the effectiveness of 
the proposed two-stage approach. 

^https://github.com/SHIYUANMING/large-scale-convex-optimization 

^https://github.com/cvxgrp/scs 
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Given the network size, we first generate and store the 
problem structure of the standard form ^cone, i-e., the structure 
of A, b, c and the descriptions of V. As this procedure can 
be done offline for all the candidate network sizes, we thus 
ignore this step for time comparison. We repeat the following 
procedures to solve the large-scale convex optimization prob¬ 
lem with different parameters and sizes using the proposed 
framework “Matrix Stuffing-i-SCS”; 

1) Copy the parameters in the problem instance ^ to the 
data in the pre-stored structure of the standard cone 
program ^^ons- 

2) Solve the resultant standard cone programming instance 
■^cone using the solver SCS. 

3) Extract the optimal solutions of ^ from the solutions 
to ^cone by the solver SCS. 

Finally, note that all the interior-point solvers are multiple 
threaded (i.e., they can utilize multiple threads to gain extra 
speedups), while the operator splitting algorithm solver SCS 
is single threaded. Nevertheless, we will show that SCS 
performs much faster than the interior-point solvers. We also 
emphasize that the operator splitting method aims at scaling 
well to large problem sizes and thus provides solutions to 
modest accuracy within reasonable time, while the interior- 
point method intends to provide highly accurate solutions. 
Furthermore, the modeling framework CVX aims at rapid 
prototyping and providing a user-friendly tool for automati¬ 
cally transformations for general problems, while the matrix¬ 
stuffing technique targets at scaling to large-scale problems for 
the specific problem family Therefore, these frameworks 
and solvers are not really comparable with different purposes 
and application capabilities. We mainly use them to verify 
the effectiveness and reliability of our proposed framework in 
terms of the solution time and the solution quality. 

A. Effectiveness and Reliability of the Proposed Large-Scale 
Convex Optimization Framework 

Consider a network with L 2-antenna RAUs and K si^le- 
antenna MUs uniformly and independently distributecO in 
the square region [—3000,3000] x [—3000,3000] meters with 
L = K. We consider the total transmit power minimization 
problem <^1(7) with the QoS requirements for each MU as 
7fc = 5 dB, Vfc. Table |T] demonstrates the comparison of the 
running time and solutions using different convex optimization 
frameworks. Each point of the simulation results is averaged 
over 100 randomly generated network realizations (i.e., one 
small scaling fading realization for each large-scale fading 
realization). 

For the modeling time comparisons, this table shows that the 
value of the proposed matrix stuffing technique ranges between 
0.01 and 30 second^ for different network sizes and can 
speedup about 15x to 60x compared to the parser/solver mod¬ 
eling framework CVX. In particular, for large-scale problems, 

'^Consider the CSI acquisition overhead, our proposed approach is mainly 
suitable in the low user mobility scenarios. 

^This value can be significantly reduced in practical implementations, e.g., 
at the BBU pool in Cloud-RAN, which, however, requires substantial further 
investigation. Meanwhile, the results effectively confirm that the proposed 
matrix stuffing technique scales well to large-scale problems. 


the transformation using CVX is time consuming and becomes 
the bottleneck, as the “modeling time” is comparable and even 
larger than the “solving time”. For example, when L = 150, 
the “modeling time” using CVX is about 3 minutes, while the 
matrix stuffing only requires about 10 seconds. Therefore, the 
matrix stuffing for fast transformation is essential for solving 
large-scale convex optimization problems quickly. 

For the solving time (which can be easily calculated by 
subtracting the “modeling time” from the “total time”) using 
different solvers, this table shows that the operator splitting 
solver can speedup by several orders of magnitude over the 
interior-point solvers. For example, for L = 50, it can speedup 
about 20x and 130x over MOSEK@ and SDPT3, respectively, 
while SeDuMi is inapplicable for this problem size as the 
running time exceeds the pre-defined maximum value, i.e., 
one hour. In particular, all the interior-point solvers fail to 
solve large-scale problems (i.e., L = 100,150, 200), denoted 
as “N/A”, while the operator splitting solver SCS can scale 
well to large problem sizes. For the largest problems with 
L = 200, the operator splitting solver can solve them in about 
5 minutes. 

For the quality of the solutions, this table shows that the 
propose framework can provide a solution to modest accuracy 
within much less time. For the two problem sizes, i.e., L = 20 
and L = 50, which can be solved by the interior-point method 
based frameworks, the optimal values attained by the proposed 
framework are within 0.03% of that obtained via the second- 
order method frameworks. 

In summary, the proposed two-stage based large-scale con¬ 
vex optimization framework scales well to large-scale problem 
modeling and solving simultaneously. Therefore, it provides an 
effective way to evaluate the system performance via large- 
scale optimization in dense wireless networks. However, its 
implementation and performance in practical systems still need 
further investigation. In particular, this set of results indicate 
that the scale of cooperation in dense wireless networks may 
be fundamentally constrained by the computation complex¬ 
ity/time. 

B. Infeasibility Detection Capability 

A unique property of the proposed framework is its infeasi¬ 
bility detection capability, which will be verified in this part. 
Consider a network with L = 50 single-antenna RAUs and 
K = 50 single-antenna MUs uniformly and independently 
distributed in the square region [—2000, 2000] x [—2000, 2000] 
meters. The empirical probabilities of feasibility in Fig. [3 
show that the proposed framework can detect the infeasibility 
accurately compared with the second-order method frame¬ 
work “CVX-I-SDPT3” and the first-order method framework 
“CVX-i-SCS”. Each point of the simulation results is av¬ 
eraged over 200 randomly generated network realizations. 
The average (“total time”, “solving time”) for obtaining a 
single point with “CVX-I-SDPT3”, “CVX-i-SCS” and “Matrix 

^Although SeDuMi, SDPT3 and MOSEK (commercial software) are all 
based on the interior-point method, the implementation efficiency of the corre¬ 
sponding software packages varies substantially. In the following simulations, 
we mainly compare with the state-of-art public solver SDPT3. 
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TABLE I 

Time and Solution Results for Different Convex Optimization Frameworks 


Network Size (L = K) 


20 

50 

100 

150 

200 

CVX+SeDuMi 

Total Time [sec] 

8.1164 

N/A 

N/A 

N/A 

N/A 

Objective [W] 

12.2488 

N/A 

N/A 

N/A 

N/A 

CVX+SDPT3 

Total Time [sec] 

5.0398 

330.6814 

N/A 

N/A 

N/A 

Objective [W] 

12.2488 

6.5216 

N/A 

N/A 

N/A 

CVX+MOSEK 

Total Time [sec] 

1.2072 

51.6351 

N/A 

N/A 

N/A 

Objective [W] 

12.2488 

6.5216 

N/A 

N/A 

N/A 


Total Time [sec] 

0.8501 

5.6432 

51.0472 

227.9894 

725.6173 

CVX+SCS 

Modeling Time [sec] 

0.7563 

4.4301 

38.6921 

178.6794 

534.7723 


Objective [W] 

12.2505 

6.5215 

3.1303 

2.0693 

1.5404 


Total Time [sec] 

0.1137 

2.7222 

26.2242 

90.4190 

328.2037 

Matrix Stuffing+SCS 

Modeling Time [sec] 

0.0128 

0.2401 

2.4154 

9.4167 

29.5813 


Objective [W] 

12.2523 

6.5193 

3.1296 

2.0689 

1.5403 



Fig. 3. The empirical probability of feasibility versus target SINR with 
different network sizes. 

Stuffing+SCS” are (101.7635, 99.1140) seconds, (5.0754, 
2.3617) seconds and (1.8549, 1.7959) seconds, respectively. 
This shows that the operator splitting solver can speedup about 
50x over the interior-point solver. 

We further consider a larger-sized network with L = 100 
single-antenna RAUs and K = 100 single-antenna MUs 
uniformly and independently distributed in the square region 
[—2000, 2000] X [—2000, 2000] meters. As the second-order 
method framework fails to scale to this size, we only compare 
with the first-order method framework. Fig. [3 demonstrates 
that the proposed framework has the same infeasibility de¬ 
tection capability as the first-order method framework. This 
verifies the correctness and the reliability of the proposed 
fast transformation via matrix stuffing. Each point of the 
simulation results is averaged over 200 randomly generated 
network realizations. The average (“solving time”, “modeling 
time”) for obtaining a single point with “CVX-i-SCS” and 
“Matrix Stuffing-i-SCS” are (41.9273,18.6079) seconds and 
(31.3660,0.5028) seconds, respectively. This shows that the 
matrix stuffing technique can speedup about 40x over the 
numerical based parser/solver modeling framework CVX. We 
also note that the solving time of the proposed framework is 
smaller than the framework “CVX-i-SCS”, the speedup is due 
to the warm-staring ll^ Section 4.2]. 



Target SINR [dB] 

Fig. 4. Average normalized network power consumption (i.e., the obtained 
optimal total network power consumption over the maximum network power 
consumption with all the RAUs active and full power transmission) versus 
target SINR with different network sizes. 

C. Group Sparse Beamforming for Network Power Minimiza¬ 
tion 

In this part, we simulate the network power minimization 
problem using the group sparse beamforming algorithm O 
Algorithm 2]. We set each fronthaul link power consumption 
as 5.6W and set the power amplifier efficiency coefficient 
for each RAU as 25%. In this algorithm, a sequence of 
convex feasibility problems need to be solved to determine 
the active RAUs and one convex optimization problem needs 
to be solved to determine the transmit beamformers. This 
relies on the infeasibility detection capability of the proposed 
framework. 

Consider a network with L = 20 2-antenna RAUs and 
K = AO single-antenna MUs uniformly and independently 
distributed in the square region [—1000,1000] x [—1000,1000] 
meters. Each point of the simulation results is averaged over 
50 randomly generated network realizations. Eig. |4] demon¬ 
strates the accuracy of the solutions in the network power 
consumption obtained by the proposed framework compared 
with the second-order method framework “CVX-I-SDPT3” and 
the first-order method framework “CVX-hSCS”. The average 
(“total time”, “solving time”) for obtaining a single point with 
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Fig. 5. The minimum network-wide achievable versus transmit SNR with 
55 single-antenna RAUs and 50 single-antenna MUs. 

“CVX+SDPT3”, “CVX+SCS” and “Matrix Stuffing+SCS” 
are (48.6916, 41.0316) seconds, (9.4619, 1.7433) seconds 
and (2.4673, 2.3061) seconds, respectively. This shows that 
the operator splitting solver can speedup about 20x over the 
interior-point solver. 

We further consider a larger-sized network with L = 50 2- 
antenna RAUs and K = 50 single-antenna MUs uniformly and 
independently distributed in the square region [—3000,3000] x 
[—3000, 3000] meters. As the second-order method framework 
is not applicable to this problem size, we only compare with 
the first-order method framework. Each point of the simula¬ 
tion results is averaged over 50 randomly generated network 
realizations. Fig. |4] shows that the proposed framework can 
achieve the same solutions in network power consumption as 
the first-order method framework “CVX-i-SCS”. The average 
(“solving time”, “modeling time”) for obtaining a single 
point with “CVX-i-SCS” and “Matrix Stuffing-i-SCS” are 
(11.9643,69.0520) seconds and (14.6559,2.1567) seconds, 
respectively. This shows that the matrix stuffing technique 
can speedup about 30x over the numerical based parser/solver 
modeling framework CVX. 

In summary. Fig. 0] demonstrates the capability of infeasi¬ 
bility detection (as a sequence of convex feasibility problems 
need to be solved in the RAU selection procedure), the accu¬ 
racy of the solutions, and speedups provided by the proposed 
framework over the existing frameworks. 

D. Max-min Rate Optimization 

We will simulate the minimum network-wide achievable 
rate maximization problem using the max-min fairness op¬ 
timization algorithm in ll22l Algorithm 1] via the bi-section 
method, which requires to solve a sequence of convex fea¬ 
sibility problems. We will not only show the quality of the 
solutions and speedups provided by the proposed framework, 
but also demonstrate that the optimal coordinated beamform- 
ers signihcantly outperform the low-complexity and heuristic 
transmission strategies, i.e., zero-forcing beamforming (ZFBF) 
ll48l . regularized zero-forcing beamforming (RZF) ll^ 
and maximum ratio transmission (MRT) ifSOl . 

Consider a network with L = 55 single-antenna RAUs and 
K = 50 single-antenna MUs uniformly and independently 


distributed in the square region [—5000, 5000] x [—5000, 5000] 
meters. Fig. [5] demonstrates the minimum network-wide 
achievable rate versus different SNRs (which is dehned as 
the transmit power at all the RAUs over the receive noise 
power at all the MUs) using different algorithms. Each point of 
the simulation results is averaged over 50 randomly generated 
network realizations. For the optimal beamforming, this hgure 
shows the accuracy of the solutions obtained by the pro¬ 
posed framework compared with the hrst-order method frame¬ 
work “CVX-I-SCS”. The average (“solving time”, “modeling 
time”) for obtaining a single point for the optimal beam¬ 
forming with “CVX-I-SCS” and “Matrix Stuffing-i-SCS” are 
(176.3410, 55.1542) seconds and (82.0180, 1.2012) seconds, 
respectively. This shows that the proposed framework can 
reduce both the solving time and modelling time via warm¬ 
starting and matrix stuffing, respectively. 

Furthermore, this hgure also shows that the optimal beam¬ 
forming can achieve quite an improvement for the per-user rate 
compared to suboptimal transmission strategies RZF, ZFBF 
and MRT, which clearly shows the importance of developing 
optimal beamforming algorithms for such networks. The aver¬ 
age (“solving time”, “modeling time”) for a single point using 
“CVX-I-SDPT3” for the RZF, ZFBF and MRT are (2.6210, 
30.2053) seconds, (2.4592, 30.2098) seconds and (2.5966, 
30.2161) seconds, respectively. Note that the solving time 
is very small, which is because we only need to solve a 
sequence of linear programming problems for power control 
when the directions of the beamformers are hxed during the 
bi-section search procedure. The main time consuming part is 
from transformation using CVX. 

VII. Conclusions and Further Works 

In this paper, we proposed a unihed two-stage framework 
for large-scale optimization in dense wireless cooperative 
networks. We showed that various performance optimiza¬ 
tion problems can be essentially solved by solving one or 
a sequence of convex optimization or feasibility problems. 
The proposed framework only requires the convexity of the 
underlying problems (or subproblems) without any other 
structural assumptions, e.g., smooth or separable functions. 
This is achieved by hrst transforming the original convex 
problem to the standard form via matrix stuffing and then 
using the ADMM algorithm to solve the homogeneous self¬ 
dual embedding of the primal-dual pair of the transformed 
standard cone program. Simulation results demonstrated the 
infeasibility detection capability, the modeling flexibility and 
computing scalability, and the reliability of the proposed 
framework. 

In principle, one may apply the proposed framework to 
any large-scale convex optimization problems and only needs 
to focus on the standard form reformulation as shown in 
Appendix, as well as to compute the proximal operators for 
different cone projections in (l42l i. However, in practice, we 
need to address the following issues to provide a user-friendly 
framework and to assist practical implementation: 

> Although the parse/solver frameworks like CVX can 
automatically transform an original convex problem into 
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the standard form numerically based on the graph imple¬ 
mentation, extending such an idea to the automatic and 
symbolic transformation, thereby enabling matrix stuffing, 
is desirable but challenging in terms of reliability and 
correctness verification. 

• Efficient projection algorithms are highly desirable. For 
the subspace projection, as discussed in Section IV-BI 
parallel factorization and symbolic factorization are espe¬ 
cially suitable for the cloud computing environments as 
in Cloud-RAN Q, ID. For the cone projection, although 
the projection on the second-order cone is very efficient, 
as shown in dH, projecting on the semidefinite cone 
(which is required to solve the semidefinite programming 
problems) is computationally expensive, as it requires to 
perform eigenvalue decomposition ifTSll . The structure of 
the cone projection should be exploited to make speedups. 

• It is interesting to apply the proposed framework to 
various non-convex optimization problems. For instance, 
the well-known majorization-minimization optimization 
provides a principled way to solve the general non-convex 
problems, whereas a sequence of convex subproblems 
need to be solved at each iteration. Enabling scalable 
computation at each iteration will hopefully lead to 
scalability of the overall algorithm. 


where Gi{l) is the Smith form reformulation for the transmit 
power constraint for RAU I dH) as follows 

f ( 2/0 >yl) e 

01(0 2/0 = (47) 


and G 2 {k) is the Smith form reformulation for the QoS 
constraint for MU k (1451) as follows 


02 (fc) : 


to = G R 

tj =t^ G 
= CfcV G 
= gfc G R^+^ 


(48) 


Nevertheless, the Smith form reformulation (l46l) is not 
convex due to the non-convex constraint ||xi|| = a:o. We thus 
relax the non-convex constraint as ||xi|| < xq, yielding the 
following relaxed Smith form 


where 


minimize xq 

subject to Go,Giil),G 2 ik),Vk,l, 


J (xo,xi) G 
\xi = V G 


(49) 

(50) 


Appendix 

Conic Formulation for Convex Programs 

We shall present a systematic way to transform the original 
problem to the standard convex cone programming form. We 
first take the real-field problem 23^ with the objective function 
/(x) = ||v||2 as an example. At the end of this subsection, 
we will show how to extend it to the complex-field. 

According to the principle of the disciplined convex pro¬ 
gramming m, the original problem ^ can be rewritten as 
the following disciplined convex programming form Go) 


2 ^cvx : minimize ||v||2 

subject to \/Wi,l = l,...,L (44) 

||CfcV-l-gfc||2 < ^fcrfcV,/c = 1,...,A:,(45) 
where D; = blkdiagjD^^,..., G RMtcxATiT 

G 


Df 


^NixEf=i+i Ni 


ixN 


/?fc — + l/7fc, Tfe — 0^_i)7V ’1 

Sk = [0^1 S and = [Ck,0NK]'^ £ 

^(K+i)xNK ^ blkdiag{hfc,..., hfc} G It 

is thus easy to check the convexity of problem 23^cvx, following 
the disciplined convex programming ruleset m- 


A. Smith Form Reformulation 

To arrive at the standard convex cone program 2^cone, we 
rewrite problem 2^cvx as the following Smith form ll2^ by 
introducing a new variable for each subexpression in 23^cvx, 

minimize Xg 

subject to ||xi|| = a;o,xi = v 

Gi{l),G2{k),yk,l, 


It can be easily proved that the constraint ||xi|| < xg has to be 
active at the optimal solution; otherwise, we can always scale 
down Xg such that the cost function can be further minimized 
while still satisfying the constraints. Therefore, we conclude 
that the relaxed Smith form (|49] | is equivalent to the original 
problem 

B. Conic Reformulation 

Now, the relaxed Smith form reformulation (|4^ is readily 
to be reformulated as the standard cone programming form 
■^cone- Specifically, define the optimization variables [xqIv] 
with the same order of equations as in Go, then Go can be 
rewritten as 


M[xg]v]+fig = in, (51) 

where the slack variables belong to the following convex set 
Fo e (52) 


and M G ]R(tVR'-i-i)x(tVA'+i) g 

follows 


M = 


-1 



— liVJf _ 


,m = 


Onk 


are given as 


(53) 


respectively. Define the optimization variables [t/g; v] with the 
same order of equations as in Gi (1), then Gi (1) can be rewritten 
as 


P'[22o;v] + M =p', (54) 

where the slack variables G R^^*+^ belongs to the 
following convex set formed by the Cartesian product of two 
convex sets 


(46) 


G X 


(55) 
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and P* G ]^(J£'Wi+2)x(Afif+i) pi ^ ^kNi+2 given as 
follows 



1 



\VPi] 


p' = 

-1 

-D/ 

,p' = 

0 

OifAT, 

, (56) 


respectively. Define the optimization variables [fg; v] with the 
same order of equations as in G 2 {k), then G 2 {k) can be 
rewritten as 


Q^[^g;v]+/r^ = q^ (57) 


where the slack variables belong to the following 

convex set formed by the Cartesian product of two convex sets 


/X2 e S' X Q 


K-\-2 


(58) 


and G and G are given as 

follows 


£) 

II 

1 

1 

_1 


■ 0 ■ 

-1 


,q'= = 

0 



-Cfc 


gfc 


(59) 


respectively. 

Therefore, we arrive at the standard form ^cone by writing 
the optimization variables iz G R” as follows 


ones, and A and b are given as follows; 


A = 



1 

1 





1 

1 

-/3irf 

-1 



— ^NK 


-1 


-Di 






-1 


Q 

1 



-1 

1 

n 







-1 

-Ck _ 


,b = 


VPi' 


0 

Onk 


0 

OkNi 


0 

OkNi 


0 

gi 


0 

Sk 


,(64) 


respectively. 


C. Matrix Stuffing 

Given a problem instance to arrive at the standard cone 
program form, we only need to copy the parameters of the 
maximum transmit power P/’s to the data of the standard form, 
i.e., v^’s in b, copy the parameters of the SINR thresholds 
7 to the data of the standard form, i.e., ^^’s in A, and copy 
the parameters of the channel realizations hfc’s to the data of 
the standard form, i.e., r/t’s and C^’s in A. As we only need 
to perform copying the memory for the transformation, this 
procedure can be very efficient compared to the state-of-the- 
art numerical based modeling frameworks like CVX. 


^ (60) 

D. Extension to the Complex Case 


and c = The structure of the standard cone pro¬ 
gramming is characterized by the following data 


n=l + L + K + NK, (61) 

L 

m = {L + K) + {NK -f 1) + '^{KNi + 1) + K{K + 2),(62) 

1^1 


/c = X • • • X X X X • • • X 


L-\-K 

X ••• X Q^+^ 
K 


(63) 


where /C is the Cartesian product of 2(L-|-A)-|-l second-order 


For hfc G G C'^, we have 


■in(hfc) -a(hfc)' 

T 

■in(v,) ■ 

(I(hfc) IH(hfc) 






hfc 


(65) 


where hfc G R^^^^ and V/ G R^^. Therefore, the complex- 
field problem can be changed into the real-field problem by 
the transformations: hfc ^ hfc and v/ v/. 
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